How to Implement Your Own Estimators
====================================

All the estimators in all packages follow the pre-defined protocols based on their types. All the implementations of algorithms which follow the protocols in ``s3l.base`` can be evaluated as the built-in algorithms by experiment classes. 

The estimators should inherit a base estimator class in ``s3l.base`` according to the type of the estimator you are going to implement. We currently provide five options for you: 

#. ``TransductiveEstimatorwithGraph``, 
#. ``TransductiveEstimatorWOGraph``, 
#. ``InductiveEstimatorWOGraph``, 
#. ``InductiveEstimatorwithGraph``, 
#. ``SupervisedEstimator``. 

As the names indicate, the experiments support supervised learning algorithms, semi-supervised learning algorithms in both inductive and transductive settings with or without graph.

For each estimator class, you must implement the following methods: ``set_params``, ``fit`` and ``predict``.

``set_params`` is the methods to configure the parameters of the estimator objects given a dict storing the values of some parameters. It's called in the experiments to search for the best hyper-parameters. Since the object is used repeatly with different hyper-parameters, **you should make sure that the object is reset as if hadn't been trained**. A common implementation is as follows.

.. code:: python

    def set_params(self, param):
            """Parameter setting function.

            Parameters
            ----------
            param：dict
                Store parameter names and corresponding values {'name': value}.
            """
            if isinstance(param, dict):
                self.__dict__.update(param)
            
            # Codes to reset some properties which may influence the
            # prediction.

``fit`` is the method to train the model given data; ``predict`` is the method to make prediction. The main difference between base classes is the parameters of the ``fit`` and ``predict``. For transductive estimator, the ``predict`` method takes in the indexes of instances to predict (the estimator can see the testing data when training). For inductive estimator, the ``predict`` method takes in the features of instances to predict. ``fit`` method always takes *X*, *y*, *l_ind*, and optional args are supported. For graph-based algorithms, *W* must be provided for ``fit`` method.

For supervised learning algorithm, you can inherit ``SupervisedEstimator`` class. You must rewrite ``__init__`` method and initialize the member *model* as an object of supervised learning model, and *model* must have the following methods:

.. code:: python

    class SupervisedEstimator(BaseEstimator):
        """ Supervised estimator of single-label task.
        """

        @abstractmethod
        def __init__(self):
            super(SupervisedEstimator, self).__init__()
            self.model = None

        def fit(self, X, y, l_ind=None, **kwargs):
            """
            Takes X, y, label_index.
            """
            if l_ind is not None:
                X = X[l_ind, :]
                if y.ndim == 2:
                    y = y[l_ind, :].reshape(-1)
                else:
                    y = y[l_ind]
            self.model.fit(X, y)
        
        def predict(self, X, **kwargs):
            """
            Takes X
            """
            return self.model.predict(X)
        
        def set_params(self, param):
            self.model.set_params(**param)
        
        def predict_proba(self, X):
            return self.model.predict_proba(X)
        
        def predict_log_proba(self, X):
            return self.model.predict_log_proba(X)

``s3l.wrapper.sklearn_wrapper`` guides you to wrap any supervised learning algorithm you like.


Attention
-----------

Sometimes your estimator class may contain *C-language* object member. The object of estimator can be un-serializable when the C object has pointers because the python interpreter has no way to know the details of the memory where the pointer points to. 

The experiment classes run the experiemnts in multi-process mode when ``n_jobs`` is set larger than 1, which requires the estimator object is serializable. An option is to rewrite the ``__getstate__`` and ``__setstate__`` methods to design the way how estimator object is dumped and loaded by ``pickle``. The simplest way is to drop the un-picklable member in ``__getstate__`` and re-initialze it in ``__setstate__``. Here is an example taken from ``s3l.classification.TSVM`` where *self.model* is a C object:

.. code:: python

    def __getstate__(self):
        """
        The model is ctypes objects and contains pointers cannot be pickled.
        So we drop the model when we pickle TSVM.
        """
        state = self.__dict__.copy()
        del state['model']  # manually delete
        return state

    def __setstate__(self, state):
        """
        The model is ctypes objects and contains pointers cannot be pickled.
        So we drop the model when we pickle TSVM.
        """
        self.__dict__.update(state)
        self.model = None  # manually update